re-enable sort_query_fuzzer_runner
#16491
Context for anyone interested: #16452 (comment)
@@ -55,6 +55,7 @@ apache-avro = { version = "0.17", default-features = false, features = [
 arrow = { workspace = true }
 arrow-ipc = { workspace = true }
 base64 = "0.22.1"
+chrono = { workspace = true }
This is temporary until the upstream bug gets fixed in arrow, plus it's necessarily already in the dependency tree because arrow uses it.
I think with these fixes to I used this script to test:

#!/usr/bin/env python3
import argparse
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from threading import Event


def run_test(command, run_num, total_runs, stop_event):
    """Run a single test and return result"""
    if stop_event.is_set():
        return run_num, "SKIPPED", None
    try:
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        status = "PASS" if result.returncode == 0 else "FAIL"
        print(f"Run {run_num}/{total_runs}: {status}")
        return run_num, status, result
    except Exception as e:
        print(f"Run {run_num}/{total_runs}: ERROR - {e}")
        return run_num, "ERROR", None


def main():
    parser = argparse.ArgumentParser(description="Run a command multiple times and report failure rate")
    parser.add_argument("-P", "--parallel", type=int, default=1, help="Number of parallel jobs (default: 1)")
    parser.add_argument("-n", "--runs", type=int, default=100, help="Number of runs (default: 100)")
    parser.add_argument("-x", "--stop-on-failure", action="store_true", help="Stop at first failure")
    parser.add_argument("command", nargs=argparse.REMAINDER, help="Command to run")
    args = parser.parse_args()
    command = " ".join(args.command)
    print(f"Running command {args.runs} times with {args.parallel} parallel jobs...")
    print(f"Command: {command}")
    print("----------------------------------------")
    stop_event = Event()
    failures = 0
    completed_runs = 0
    failure_outputs = []
    with ThreadPoolExecutor(max_workers=args.parallel) as executor:
        # Submit all jobs
        futures = []
        for i in range(1, args.runs + 1):
            future = executor.submit(run_test, command, i, args.runs, stop_event)
            futures.append(future)
        # Process results as they complete
        for future in as_completed(futures):
            run_num, status, result = future.result()
            completed_runs += 1
            if status == "FAIL" or status == "ERROR":
                failures += 1
                if result and (result.stdout or result.stderr):
                    failure_outputs.append((run_num, result.stdout, result.stderr))
                if args.stop_on_failure:
                    print(f"Stopping at first failure (run {run_num})")
                    stop_event.set()
                    # Cancel remaining futures
                    for f in futures:
                        f.cancel()
                    break
    print("----------------------------------------")
    print("Results:")
    print(f"Total runs: {completed_runs}")
    print(f"Failures: {failures}")
    print(f"Passes: {completed_runs - failures}")
    if completed_runs > 0:
        failure_rate = (failures * 100) / completed_runs
        print(f"Failure rate: {failure_rate:.2f}%")
    else:
        print("Failure rate: 0%")
    # Print failure outputs
    if failure_outputs:
        print("\n" + "="*50)
        print("FAILURE OUTPUTS:")
        print("="*50)
        for run_num, stdout, stderr in failure_outputs:
            print(f"\n--- Run {run_num} ---")
            if stdout:
                print("STDOUT:")
                print(stdout)
            if stderr:
                print("STDERR:")
                print(stderr)


if __name__ == "__main__":
    main()

And was able to run with no errors:

./run-test.py -P 10 -n 600 -x cargo test --package datafusion --test fuzz -- fuzz_cases::sort_query_fuzz::sort_query_fuzzer_runner --exact --show-output

I'm running a 1200 run to confirm now.
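For a rough sense of what a clean 600-run or 1200-run sweep can and cannot show, here is a small sketch that is not part of the PR. It assumes the runs are independent with a fixed per-run failure probability, which may not hold for a timing-dependent flake, and `failure_rate_upper_bound` is a hypothetical helper name.

```python
def failure_rate_upper_bound(clean_runs: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-run failure probability after `clean_runs`
    consecutive passes, assuming independent runs (an assumption, not a given).

    Solves (1 - p) ** clean_runs = 1 - confidence for p.
    """
    if clean_runs <= 0:
        raise ValueError("clean_runs must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_runs)


if __name__ == "__main__":
    for n in (600, 1200):
        bound = failure_rate_upper_bound(n)
        print(f"{n} clean runs: failure rate below {bound:.2%} at 95% confidence")
```

Under those assumptions, 600 clean runs bound the failure rate at roughly 0.5% and 1200 clean runs at roughly 0.25%, so a rarer flake could still slip through either sweep.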
I understand why but I do find it kind of strange that
sort_query_fuzzer_runner
LGTM. IDK if there's a precedent for formatting the error case as an empty string elsewhere in DataFusion. It seems like
I think @adriangb also fixed this in
The code and fix look good to me. Thank you for tracking this down @adriangb and @AdamGS.
I am trying to verify that I can reproduce the error locally, but so far I can't trigger it on either main or this branch. I'll report back if I am able to.
Here is my reproducer (not as fancy as what you used):
set -e
for i in `seq 1 100` ; do
    echo "*** Iteration $i "
    cargo test --test fuzz -- sort_query_fuzzer_runner &
    cargo test --test fuzz -- sort_query_fuzzer_runner &
    cargo test --test fuzz -- sort_query_fuzzer_runner &
    cargo test --test fuzz -- sort_query_fuzzer_runner &
    cargo test --test fuzz -- sort_query_fuzzer_runner &
    cargo test --test fuzz -- sort_query_fuzzer_runner &
    cargo test --test fuzz -- sort_query_fuzzer_runner &
    cargo test --test fuzz -- sort_query_fuzzer_runner &
    wait
done
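One caveat about a loop of this shape: in bash, `wait` with no arguments returns 0 regardless of how the background jobs exited, and `set -e` does not apply to background commands, so a failing run only shows up in the printed test output. Below is a minimal sketch of a variant that checks each exit code, assuming that stopping at the first failing batch is what is wanted. `run_batch` and `run_until_failure` are hypothetical helpers, not something from the PR.

```python
import subprocess
from typing import Optional


def run_batch(command: list[str], copies: int) -> list[int]:
    """Start `copies` concurrent instances of `command` and return their exit codes."""
    procs = [subprocess.Popen(command) for _ in range(copies)]
    return [p.wait() for p in procs]


def run_until_failure(
    command: list[str], iterations: int = 100, copies: int = 8
) -> Optional[int]:
    """Run batches until one contains a non-zero exit code.

    Returns the 1-based iteration number of the first failing batch,
    or None if every batch passed.
    """
    for i in range(1, iterations + 1):
        print(f"*** Iteration {i}")
        codes = run_batch(command, copies)
        if any(code != 0 for code in codes):
            print(f"Failure in iteration {i}: exit codes {codes}")
            return i
    return None
```

Calling `run_until_failure(["cargo", "test", "--test", "fuzz", "--", "sort_query_fuzzer_runner"])` would mirror the shell loop above, with eight concurrent copies per iteration for up to 100 iterations.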
Should we go ahead and merge and get the test running again (or find out quickly with the many CI runs it's still broken)? Or do you want to wait for your local testing?
I think we should merge it
Thank you @adriangb